
Model Drift in Machine Learning – How To Handle It In Big Data - KDnuggets

#artificialintelligence

The Rendezvous architecture proposed by Ted Dunning and Ellen Friedman in their book on Machine Learning Logistics was an excellent solution to a specific architectural problem I was working on. I was looking for a tried and tested design or architectural pattern that would help me run Challenger and Champion models together in a maintainable and supportable way. The rendezvous architecture is especially useful in the big data world, where you are dealing with heavy data and large pipelines. Running Challenger and Champion models together on all data is a genuine need in machine learning, where model performance can drift over time and where you always want to keep improving your models. So, before I delve deeper into this architecture, I would like to clarify some of the jargon I have used above.
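A minimal sketch of the Challenger/Champion idea the excerpt describes: every input is scored by both models, only the champion's answer is served, and both scores are logged so the challenger can be evaluated offline. The class name, the callable-model interface, and the toy models are illustrative assumptions, not part of the rendezvous reference design.

```python
class Rendezvous:
    """Score every request with both models; serve only the champion."""

    def __init__(self, champion, challenger):
        self.champion = champion
        self.challenger = challenger
        self.log = []  # (input, champion_score, challenger_score) for offline comparison

    def score(self, x):
        champ = self.champion(x)
        chall = self.challenger(x)
        self.log.append((x, champ, chall))
        return champ  # the caller only ever sees the champion's answer

# Toy stand-in "models": any callable returning a score works here.
rv = Rendezvous(champion=lambda x: 0.9 * x,
                challenger=lambda x: 0.8 * x + 0.05)
print(rv.score(1.0))  # serves the champion's score
```

If the logged challenger scores turn out better against ground truth, the challenger can be promoted to champion without changing the serving path.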


How to make sure AI and ML models survive and thrive

#artificialintelligence

An urban legend says that a data science task is mostly finished once the models are developed. The truth is that a much more important phase follows, often tougher than model development: managing and governing these ready-to-use models to keep your data science project relevant for the long haul. If you're a visual learner, you might prefer tuning into my Open Data Science Conference presentation, First Aid Kit for Data Science: Keeping Machine Learning Alive. In just over 24 minutes, I cover the machine-learning lifecycle, which includes finding the right data, preparing and exploring it, and building, registering and reassigning models. I use a fraud detection project as an example.


Using CD with machine learning models to tackle fraud

#artificialintelligence

Credit card fraudsters are always changing their behavior and developing new tactics. For banks, the damage isn't just financial; their reputations are also on the line. So how do banks stay ahead of the crooks? For many, detection algorithms are essential. Given enough data, a supervised machine learning model can learn to detect fraud in new credit card applications. This model gives each application a score, typically between 0 and 1, to indicate the likelihood that it's fraudulent. The bank can then set a threshold above which it regards an application as fraudulent; typically that threshold is chosen to keep false positives and false negatives at a level the bank finds acceptable. False positives are genuine applications mistaken for fraud; false negatives are fraudulent applications that are missed.
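The threshold trade-off described above can be illustrated with a few made-up scores and labels (1 = fraudulent application); the data and function name here are purely illustrative.

```python
def confusion_at(scores, labels, threshold):
    """Count false positives and false negatives at a given score threshold."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return fp, fn

scores = [0.95, 0.80, 0.60, 0.40, 0.20, 0.10]
labels = [1,    1,    0,    1,    0,    0]

# A lower threshold catches more fraud (fewer false negatives) but
# flags more genuine applications (more false positives).
print(confusion_at(scores, labels, 0.5))   # (1, 1)
print(confusion_at(scores, labels, 0.75))  # (0, 1)
```

Sweeping the threshold over all values and plotting the two error rates against each other is exactly what an ROC curve summarizes.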


Boosting your Machine Learning productivity with SAS Viya

#artificialintelligence

I started my MSc Business Analytics course at the University of Surrey almost one year ago. I had no prior experience in Machine Learning or data science. Before, I used to develop and manage EU projects for businesses, local authorities and non-profit organisations. I even received two international awards for best project. However, I wanted to immerse myself in the technology field and be part of the great community that enhances business by developing the products and services of the future.


A mixed model approach to drought prediction using artificial neural networks: Case of an operational drought monitoring environment

Adede, Chrisgone, Oboko, Robert, Wagacha, Peter, Atzberger, Clement

arXiv.org Machine Learning

Droughts, with their increasing frequency of occurrence, continue to negatively affect livelihoods and elements at risk. For example, the 2011 drought in East Africa is documented to have cost the Kenyan economy over $12bn. Given the foregoing, the demand for ex-ante drought monitoring systems is ever-increasing. The study uses 10 precipitation and vegetation variables that are lagged over 1, 2 and 3-month time-steps to predict drought situations. In the model space search for the most predictive artificial neural network (ANN) model, as opposed to the traditional greedy search for the most predictive variables, we use the General Additive Model (GAM) approach. Together with a set of assumptions, we thereby reduce the cardinality of the space of models. Even though we build a total of 102 GAM models, only 21 have R2 greater than 0.7 and are thus subjected to the ANN process. The ANN process itself uses the brute-force approach that automatically partitions the training data into 10 sub-samples, builds the ANN models in these samples and evaluates their performance using multiple metrics. The results show the superiority of a 1-month lag of the variables as compared to longer time lags of 2 and 3 months. The champion ANN model recorded an R2 of 0.78 in model testing using the out-of-sample data. This illustrates its ability to be a good predictor of drought situations 1 month ahead. Investigated as a classifier, the champion has a modest accuracy of 66% and a multi-class area under the ROC curve (AUROC) of 89.99%.
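A hedged sketch of the lagged-variable setup the abstract describes: each predictor is shifted by 1, 2 and 3 month time-steps so that the value observed k months ago can be paired with the current drought indicator. The function name, data layout, and sample values are illustrative assumptions, not taken from the paper.

```python
def make_lags(series, lags=(1, 2, 3)):
    """Return {lag: shifted copy of series}, padding the head with None."""
    return {k: [None] * k + series[:-k] for k in lags}

# Toy monthly precipitation values (illustrative only).
precip = [10, 12, 8, 15, 9, 11]
lagged = make_lags(precip)

print(lagged[1])  # [None, 10, 12, 8, 15, 9]
print(lagged[3])  # [None, None, None, 10, 12, 8]
```

Rows containing `None` would be dropped before model fitting, which is why longer lags cost more usable training months; the paper's comparison of 1-, 2- and 3-month lags is over exactly these shifted versions of each variable.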


Next Generation Automated Machine Learning (AML)

@machinelearnbot

Summary: Automated Machine Learning has only been around for a little over two years, and already there are over 20 providers in this space. However, a new European AML platform called Tazi, newly arrived in the US, is showing what the next generation of AML will look like. I've been a follower and a fan of Automated Machine Learning (AML) since it first appeared in the market about two years ago. I wrote an article on all five of the market entrants I could find at the time under the somewhat scary title 'Data Scientists Automated and Unemployed by 2025!'. As time passed I tried to keep up with the new entrants.


Automated Predictive Analytics – What Could Possibly Go Wrong?

@machinelearnbot

Much of that is in data cleansing, normalizing, removing skewness, transforming data for specific algorithm requirements, and even running multiple algorithms in parallel to determine champion models. So long as we are talking about things like removing skewness or normalizing data required for specific algorithms, automation is reasonable, and feature selection is fairly straightforward to automate (leaving the creative feature engineering issues aside). I wouldn't mind seeing that sort of comparative data published for all advanced analytic platforms, keeping in mind that two data scientists using the same platform can come up with different results.
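The "running multiple algorithms in parallel to determine champion models" step can be sketched very simply: fit several candidate models on the same training data, score each on a held-out set, and keep the best. The candidate "models" below are trivial stand-ins; names, data, and the mean-squared-error criterion are all illustrative assumptions.

```python
def mean_model(train):
    """Predict the training-set mean, regardless of input."""
    m = sum(y for _, y in train) / len(train)
    return lambda x: m

def last_value_model(train):
    """Predict the last observed training value, regardless of input."""
    last = train[-1][1]
    return lambda x: last

def pick_champion(candidates, train, holdout):
    """Fit every candidate, evaluate on holdout MSE, return the winner's name."""
    def mse(model):
        return sum((model(x) - y) ** 2 for x, y in holdout) / len(holdout)
    fitted = {name: fit(train) for name, fit in candidates.items()}
    return min(fitted, key=lambda name: mse(fitted[name]))

train = [(1, 2.0), (2, 2.2), (3, 2.3)]
holdout = [(4, 2.1), (5, 2.15)]
champ = pick_champion({"mean": mean_model, "last": last_value_model},
                      train, holdout)
print(champ)  # "mean" wins on this toy data
```

Real AML platforms do the same thing at scale, swapping in gradient-boosted trees, regularized regressions, and so on as candidates, but the champion-selection loop itself is this simple, which is why it automates so readily.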